An Auto-tuning JIT Compiler for Accelerating Multiple Stencil Computations
Authors
Abstract
We present a JIT compiler with auto-tuning capabilities that fuses multiple stencil computations. Data arrays in scientific computing or image processing often exceed the cache size. To exploit spatial and temporal locality on multicore architectures, a common method is to partition the images into tiled blocks. In realistic scenarios, multiple image algorithms, most of which are stencil computations, must be applied consecutively. We fuse these stencil codes on the tiled, edge-aware image blocks for efficient parallel computing. In this work, multiple kernels are concatenated and blocked according to the image dimensions and the cache size of the current hardware. To avoid a heavy communication load, a sufficient number of halo layers is added. An auto-tuning strategy maintains an optimal balance between block size and the number of halo layers. Compared with hand-optimized naïve schemes, experiments using our task-parallel model achieve at least a 50% improvement in system performance on three CPU workstations.
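The interplay between block size and halo layers described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's JIT compiler. The 3x3 integer box filter, the clamp-to-edge boundary rule, and the names `stage`, `run_reference`, `run_fused` and `autotune` are all assumptions made for illustration. Each tile is loaded together with as many halo layers as stages are fused, the stages run back to back on that block, and only the tile interior is written back, so no data is exchanged between tiles inside a fused group. The tuner simply times every candidate pair. In interpreted Python the timings reflect interpreter overhead rather than cache behaviour, so only the structure of the search carries over.

```python
import time


def stage(block):
    """One 3x3 box-filter pass with clamp-to-edge boundaries (integer math)."""
    h, w = len(block), len(block[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                rr = min(max(r + dr, 0), h - 1)
                for dc in (-1, 0, 1):
                    cc = min(max(c + dc, 0), w - 1)
                    total += block[rr][cc]
            out[r][c] = total // 9
    return out


def run_reference(img, n_stages):
    """Unfused baseline: every stage sweeps the whole image."""
    for _ in range(n_stages):
        img = stage(img)
    return img


def run_fused(img, n_stages, tile, halo):
    """Fuse up to `halo` stages per sweep on overlapping tiles (hypothetical API)."""
    if tile < 1 or halo < 1:
        raise ValueError("tile and halo must be at least 1")
    h, w = len(img), len(img[0])
    done = 0
    while done < n_stages:
        fused = min(halo, n_stages - done)  # stages fused in this sweep
        out = [[0] * w for _ in range(h)]
        for r0 in range(0, h, tile):
            for c0 in range(0, w, tile):
                r1, c1 = min(r0 + tile, h), min(c0 + tile, w)
                # Load the tile plus `fused` halo layers, clipped to the image.
                br0, br1 = max(r0 - fused, 0), min(r1 + fused, h)
                bc0, bc1 = max(c0 - fused, 0), min(c1 + fused, w)
                block = [row[bc0:bc1] for row in img[br0:br1]]
                # Errors from artificial block edges creep inward one cell per
                # stage, so after `fused` stages they are confined to the halo.
                for _ in range(fused):
                    block = stage(block)
                for r in range(r0, r1):
                    out[r][c0:c1] = block[r - br0][c0 - bc0:c1 - bc0]
        img = out
        done += fused
    return img


def autotune(img, n_stages, tiles, halos):
    """Exhaustively time each (tile, halo) pair and return the fastest one."""
    best = None
    for tile in tiles:
        for halo in halos:
            start = time.perf_counter()
            run_fused(img, n_stages, tile, halo)
            elapsed = time.perf_counter() - start
            if best is None or elapsed < best[0]:
                best = (elapsed, tile, halo)
    return best[1], best[2]
```

A larger halo lets more stages run per sweep over the image but increases the redundant work done in the overlapping regions, which is the trade-off the tuner searches over. A call such as `autotune(img, 4, [8, 16, 32], [1, 2, 4])` returns the fastest measured `(tile, halo)` pair for that machine and input.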
Similar resources
PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations
PATUS is a code generation and auto-tuning framework for stencil computations targeted at modern multi- and many-core processors, such as multicore CPUs and graphics processing units. Its ultimate goals are to provide a means towards productivity and performance on current and future multi- and many-core platforms. The framework generates the code for a compute kernel from a specification of the st...
Model-Driven Auto-Tuning of Stencil Computations on GPUs
Stencil computations are a class of algorithms which perform nearest-neighbor computation, often on a multi-dimensional grid. This type of calculation forms the basis for computer simulations across almost every field of science. The increasing computational speed of graphics processing units (GPUs) makes their use for stencil computations an interesting goal. However, achieving highly efficient...
Auto-tuning Parallel Programs at Compiler- and Application-Levels
Auto-tuning has recently received its fair share of attention from the High Performance Computing community. Most auto-tuning approaches are specialized to work either on specific domains (dense/sparse linear algebra, stencil computations, etc.) or only at certain stages of program execution (compile-time, launch-time or run-time). Real scientific applications, however, demand a cohesive environmen...
A Generalized Framework for Auto-tuning Stencil Computations
This work introduces a generalized framework for automatically tuning stencil computations to achieve superior performance on a broad range of multicore architectures. Stencil (nearest-neighbor) based kernels constitute the core of many important scientific applications involving block-structured grids. Auto-tuning systems search over optimization strategies to find the combination of tunable p...
Auto-tuning the 27-point Stencil for Multicore
This study focuses on the key numerical technique of stencil computations, used in many different scientific disciplines, and illustrates how auto-tuning can be used to produce very efficient implementations across a diverse set of current multicore architectures.